
Don’t misinterpret odds ratios for numerical predictors

The OR always represents the factor by which the odds of the outcome event increase when the predictor increases by exactly one unit of measure, whatever that unit may be. Sometimes you may want to express the OR in more convenient units than the ones the data was recorded in. For the example in Table 18-1, the OR for dose as a predictor of death is 1.0115 per REM. This isn’t too meaningful because one REM is a very small increment of radiation. By raising 1.0115 to the 100th power, you get the equivalent OR of 3.1375 per 100 REMs, and you can express this as, “Every additional 100 REMs of radiation more than triples the odds of dying.”
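The conversion is nothing more than exponent arithmetic, so you can check it in a couple of lines of Python (a minimal sketch using the numbers from this example):

```python
# Convert an OR per 1 REM to an OR per 100 REMs by raising it to the 100th power
or_per_rem = 1.0115
or_per_100_rems = or_per_rem ** 100
print(round(or_per_100_rems, 4))  # 3.1375 -- more than triple the odds
```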

The value of a regression coefficient depends on the units in which the corresponding predictor variable is expressed. So the coefficient of a height variable expressed in meters is 100 times larger than the coefficient of height expressed in centimeters. In logistic regression, ORs are obtained by exponentiating the coefficients, so switching from centimeters to meters corresponds to raising the OR (and its confidence limits) to the 100th power.
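You can verify this relationship by fitting the same model twice. The following sketch (assuming you have statsmodels installed; the height-based outcome here is made up purely for illustration) fits one logistic model with height in centimeters and one with height in meters:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: outcome depends on height (invented for illustration only)
rng = np.random.default_rng(1)
height_cm = rng.normal(170, 10, 1000)
prob = 1 / (1 + np.exp(-0.05 * (height_cm - 170)))
y = rng.binomial(1, prob)

# Fit the identical model twice: once per centimeter, once per meter
fit_cm = sm.Logit(y, sm.add_constant(height_cm)).fit(disp=False)
fit_m = sm.Logit(y, sm.add_constant(height_cm / 100)).fit(disp=False)

b_cm, b_m = fit_cm.params[1], fit_m.params[1]
print(b_m / b_cm)                        # 100: the meter coefficient is 100x larger
print(np.exp(b_m), np.exp(b_cm) ** 100)  # same OR per meter, computed two ways
```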

Beware of the complete separation problem

Imagine your logistic regression model perfectly predicted the outcome, in that every individual positive for the outcome had a predicted probability of 1.0, and every individual negative for the outcome had a predicted probability of 0. This is called perfect separation or complete separation, and it’s also known as the perfect predictor problem. It’s a nasty and surprisingly frequent problem that’s unique to logistic regression, and it highlights a sad irony: a logistic regression model will fail to converge in the software precisely because the model fits perfectly!

If the predictor variable or variables in your model completely separate the yes outcomes from the no outcomes, the maximum likelihood method will try to make the coefficient of that variable infinite, which usually causes an error in the software. If the coefficient is positive, the OR tries to be infinity; if the coefficient is negative, the OR tries to be 0. The SE of the OR tries to be infinite, too. This may cause your CI to have a lower limit of 0, an upper limit of infinity, or both.
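You can watch this happen with a bare-bones maximum-likelihood fit. This sketch (plain NumPy, using the same Newton-Raphson iteration most logistic software uses under the hood, on a made-up perfectly separated data set) shows the coefficient marching off toward infinity instead of settling down:

```python
import numpy as np

# Perfectly separated data: the outcome is 0 whenever x < 0 and 1 whenever x > 0
x = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

b = 0.0  # coefficient for x (no intercept, to keep the sketch simple)
for it in range(1, 31):
    p = 1 / (1 + np.exp(-b * x))          # fitted probabilities
    score = np.sum((y - p) * x)           # first derivative of the log-likelihood
    info = np.sum(p * (1 - p) * x ** 2)   # negative second derivative
    b += score / info                     # one Newton-Raphson step
    if it % 10 == 0:
        print(f"iteration {it}: b = {b:.1f}, OR = {np.exp(b):.3g}")
# b never converges -- it keeps growing, and the OR explodes toward infinity
```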

Check out Figure 18-8, which visually describes the problem. The regression is trying to make the curve come as close as possible to all the data points. Usually it has to strike a compromise, because there’s a mixture of 1s and 0s, especially in the middle of the data. But with perfectly separated data, no compromise is necessary. As b becomes infinitely large, the logistic function morphs into a step function that touches all the data points (observe where b = 5).
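If you don’t have the figure handy, you can reproduce its gist yourself. This sketch (assuming matplotlib is available) plots the logistic curve 1/(1 + e^(-bx)) for increasing values of b, showing how it steepens toward a step function:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-4, 4, 400)
for b in (0.5, 1, 2, 5):
    # Logistic curve: steeper for larger b, nearly a step function by b = 5
    plt.plot(x, 1 / (1 + np.exp(-b * x)), label=f"b = {b}")
plt.xlabel("Predictor value")
plt.ylabel("Predicted probability")
plt.legend()
plt.title("The logistic curve steepens toward a step function as b grows")
plt.show()
```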